NSF PAR Search | NSF Public Access Repository


RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Wang, Yufei; Sun, Zhanyi; Zhang, Jesse; Xian, Zhou; Biyik, Erdem; Held, David; Erickson, Zackory (June 2024, arxiv.org)

Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative trial and error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visual observations, by leveraging feedback from vision language foundation models (VLMs). The key to our approach is to query these models for preferences over pairs of the agent's image observations, based on the text description of the task goal, and then learn a reward function from the preference labels, rather than directly prompting these models to output a raw reward score, which can be noisy and inconsistent. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains, including classic control as well as manipulation of rigid, articulated, and deformable objects, without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions.
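The idea of learning a reward function from pairwise preference labels can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a Bradley-Terry preference model, a linear reward over small hand-made feature vectors in place of images, and a hypothetical `mock_vlm_preference` function standing in for the VLM query. All function names and hyperparameters are illustrative assumptions.

```python
import math
import random

def reward(weights, features):
    """Linear reward r(s) = w . phi(s); a stand-in for an image-based reward network."""
    return sum(w * f for w, f in zip(weights, features))

def preference_prob(weights, feat_a, feat_b):
    """Bradley-Terry probability that observation A is preferred over observation B."""
    diff = reward(weights, feat_a) - reward(weights, feat_b)
    return 1.0 / (1.0 + math.exp(-diff))

def train_reward(pairs, labels, dim, lr=0.5, epochs=200):
    """Fit reward weights by gradient descent on the cross-entropy preference loss.

    labels[i] is 1 if the first observation of pairs[i] was preferred, else 0.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for (feat_a, feat_b), label in zip(pairs, labels):
            err = preference_prob(weights, feat_a, feat_b) - label
            for k in range(dim):
                grad[k] += err * (feat_a[k] - feat_b[k])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

def mock_vlm_preference(feat_a, feat_b):
    """Hypothetical stand-in for a VLM query: prefers the larger first feature."""
    return 1 if feat_a[0] > feat_b[0] else 0

# Assumed toy data: 2-D feature vectors in place of image observations.
rng = random.Random(0)
observations = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
pairs = [(rng.choice(observations), rng.choice(observations)) for _ in range(200)]
labels = [mock_vlm_preference(a, b) for a, b in pairs]
learned = train_reward(pairs, labels, dim=2)
```

In this toy setup the learned reward should rank observations with a larger first feature higher, mirroring the mock labeler. In an RL pipeline such a learned reward would then be used to train a policy; that step is omitted here.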
Full Text Available
Active Preference-Based Gaussian Process Regression for Reward Learning and Optimization

Biyik, Erdem; Huynh, Nicolas; Kochenderfer, Mykel; Sadigh, Dorsa (September 2023, The International Journal of Robotics Research (IJRR))

Full Text Available
Active Reward Learning from Online Preferences

Myers, Vivek; Biyik, Erdem; Sadigh, Dorsa (May 2023, International Conference on Robotics and Automation (ICRA))
Incentivizing Efficient Equilibria in Traffic Networks with Mixed Autonomy

https://doi.org/10.1109/TCNS.2021.3084045

Biyik, Erdem; Lazar, Daniel; Pedarsani, Ramtin; Sadigh, Dorsa (January 2021, IEEE Transactions on Control of Network Systems)
Full Text Available
Active Preference-Based Gaussian Process Regression for Reward Learning

https://doi.org/10.15607/rss.2020.xvi.041

Biyik, Erdem; Huynh, Nicolas; Kochenderfer, Mykel; Sadigh, Dorsa (July 2020, Robotics: Science and Systems)
Full Text Available
ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes

https://doi.org/10.1109/ICRA48506.2021.9560840

Li, Kejun; Tucker, Maegan; Biyik, Erdem; Novoseller, Ellen; Burdick, Joel W.; Sui, Yanan; Sadigh, Dorsa; Yue, Yisong; Ames, Aaron D. (May 2021, IEEE International Conference on Robotics and Automation (ICRA 2021))

Full Text Available
Multi-Agent Safe Planning with Gaussian Processes

Zhu, Zheqing; Biyik, Erdem; Sadigh, Dorsa (January 2020, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems)
Full Text Available
Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

https://doi.org/10.15607/rss.2020.xvi.039

Cao, Zhangjie; Biyik, Erdem; Wang, Woodrow; Raventos, Allan; Gaidon, Adrien; Rosman, Guy; Sadigh, Dorsa (July 2020, Robotics: Science and Systems)
Full Text Available
The Green Choice: Learning and Influencing Human Decisions on Shared Roads

https://doi.org/10.1109/cdc40024.2019.9030169

Biyik, Erdem; Lazar, Daniel A.; Sadigh, Dorsa; Pedarsani, Ramtin (December 2019, IEEE Conference on Decision and Control (CDC))

Full Text Available
When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans

https://doi.org/10.1145/3319502.3374832

Kwon, Minae; Biyik, Erdem; Talati, Aditi; Bhasin, Karan; Losey, Dylan P.; Sadigh, Dorsa (March 2020, Human-Robot Interaction)
Full Text Available
